Statistical Disclosure Control for Microdata Using the R-Package sdcMicro
Author
Abstract
The demand for data from surveys, censuses, or registers containing sensitive information on people or enterprises has increased significantly in recent years. However, before such data can be released to the public or to researchers, the confidentiality of every data set that may contain sensitive information about individual units must be protected. This can be achieved by applying statistical disclosure control (SDC) methods, which reduce the disclosure risk of the data. The R package sdcMicro is an easy-to-use, object-oriented implementation of SDC methods, built on S4 classes, for evaluating and anonymizing confidential microdata sets. It includes all of the popular disclosure risk and perturbation methods. After each anonymization step, the package automatically recalculates frequency counts, individual and global risk measures, information loss, and data utility statistics. All methods are optimized for computational cost so that they can handle large data sets. Practitioners can also use reporting facilities that summarize the anonymization process. We describe the package and demonstrate its functionality on a complex household survey test data set distributed by the International Household Survey Network.
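The frequency counts and risk measures mentioned above are defined over combinations of categorical key variables. A record's sample frequency f_k is the number of records that share its combination of key values. A record with f_k < k violates k-anonymity. A population frequency F_k can be roughly estimated by summing sampling weights. The Python sketch below is a minimal illustration under these assumptions. It is not sdcMicro's R API. The names `risk_summary`, `region`, `sex`, `age_group` and `w` are all hypothetical. The individual risk 1/F_k is a crude plug-in value, not the model-based estimator that sdcMicro uses.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def key_combos(records: Sequence[Record], keys: Sequence[str]) -> List[Tuple]:
    """Return the tuple of key-variable values for every record."""
    return [tuple(r[k] for k in keys) for r in records]


def sample_frequencies(records: Sequence[Record], keys: Sequence[str]) -> List[int]:
    """f_k: number of sample records sharing each record's key combination."""
    combos = key_combos(records, keys)
    counts = Counter(combos)
    return [counts[c] for c in combos]


def weighted_frequencies(records: Sequence[Record], keys: Sequence[str], weight: str) -> List[float]:
    """Rough estimate of the population frequency F_k: sum of sampling weights per key combination."""
    combos = key_combos(records, keys)
    totals: Dict[Tuple, float] = {}
    for combo, r in zip(combos, records):
        totals[combo] = totals.get(combo, 0.0) + float(r[weight])
    return [totals[c] for c in combos]


def risk_summary(records: Sequence[Record], keys: Sequence[str], weight: str, k: int = 2) -> Dict[str, object]:
    """Hypothetical helper combining frequency counts with simple individual and global risk values."""
    fk = sample_frequencies(records, keys)
    Fk = weighted_frequencies(records, keys, weight)
    # Crude plug-in individual risk 1/F_k (an assumption, NOT sdcMicro's estimator).
    individual_risk = [1.0 / F for F in Fk]
    return {
        "fk": fk,
        "k_anonymity_violations": sum(1 for f in fk if f < k),
        "individual_risk": individual_risk,
        # Global risk: expected number of re-identifications under the crude model above.
        "expected_reidentifications": sum(individual_risk),
    }


# Hypothetical toy microdata: three categorical key variables and a sampling weight "w".
records = [
    {"region": "A", "sex": "m", "age_group": "30-39", "w": 100.0},
    {"region": "A", "sex": "m", "age_group": "30-39", "w": 50.0},
    {"region": "A", "sex": "f", "age_group": "40-49", "w": 20.0},
    {"region": "B", "sex": "f", "age_group": "40-49", "w": 200.0},
]
summary = risk_summary(records, keys=["region", "sex", "age_group"], weight="w", k=2)
print(summary["fk"])                      # [2, 2, 1, 1]
print(summary["k_anonymity_violations"])  # 2
```

An anonymization tool would recompute these values after every modification of the key variables. The abstract describes sdcMicro as doing this automatically after each anonymization step.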
Similar articles
A Graphical User Interface for Microdata Protection Which Provides Reproducibility and Interactions: the sdcMicro GUI
The proposed graphical user interface (GUI) for microdata protection serves as an easy-to-handle tool for users who want to use the sdcMicro package for statistical disclosure control but are not familiar with the native R command line interface. In addition to that, interactions between objects that result from the anonymization process are provided within this GUI. This allows an automated rec...
Robust Statistics Meets SDC: New Disclosure Risk Measures for Continuous Microdata Masking
Abstract. The aim of this study is to evaluate the risk of re-identification related to distance-based disclosure risk measures for numerical variables. First, we give an overview of the disclosure risk measures that have already been proposed. Unfortunately, none of these measures accounts for outliers. We assume that outliers need more protection than observations near the center of the data cloud. Therefore...
sdcMicro: a new flexible R-package for the generation of anonymised microdata: Design issues and new methods
Data protection specialists need flexible software tools for the exploratory use of protection methods to generate high quality confidential data. Microdata protection is widely used and is often the only possible way to provide data to both researchers and users. In this paper we present a methodological and computational framework for the generation of anonymised microdata and give insights t...
WP.31 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)
Argus: Software for Statistical Disclosure Control of Microdata
In recent years Statistics Netherlands has developed a prototype version of a software package, ARGUS, to protect microdata files against statistical disclosure. In 1995 the present prototype version of ARGUS, namely version 1.1, was released. In this paper both the rules, based on checking low-dimensional combinations of values of so-called identifying variables, and the techniques, globa...
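The ARGUS entry above mentions rules that check low-dimensional combinations of values of identifying variables. The Python sketch below illustrates that idea in a simplified form. It enumerates every combination of up to `max_dim` identifying variables and flags value combinations that occur fewer than `threshold` times. It is an assumption-laden illustration, not ARGUS's actual rule set. The function name, the threshold, and the variables `region` and `occupation` are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def unsafe_combinations(
    records: Sequence[Dict[str, str]],
    id_vars: Sequence[str],
    max_dim: int = 2,
    threshold: int = 3,
) -> List[Tuple[Tuple[str, ...], Tuple[str, ...], int]]:
    """Hypothetical check: flag value combinations of 1..max_dim identifying
    variables that appear fewer than `threshold` times in the microdata."""
    unsafe = []
    for dim in range(1, max_dim + 1):
        # Every subset of identifying variables of the current dimension.
        for var_subset in combinations(id_vars, dim):
            counts = Counter(tuple(r[v] for v in var_subset) for r in records)
            # Sort for deterministic output order.
            for values, n in sorted(counts.items()):
                if n < threshold:
                    unsafe.append((var_subset, values, n))
    return unsafe


# Hypothetical toy file with two identifying variables.
records = [
    {"region": "A", "occupation": "farmer"},
    {"region": "A", "occupation": "farmer"},
    {"region": "A", "occupation": "farmer"},
    {"region": "B", "occupation": "farmer"},
    {"region": "B", "occupation": "doctor"},
]
flagged = unsafe_combinations(records, ["region", "occupation"], max_dim=2, threshold=2)
for entry in flagged:
    print(entry)
```

Only low dimensions are checked, so the number of variable subsets stays small, which keeps such a check cheap on wide files.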
Journal:
- Trans. Data Privacy
Volume: 1
Publication year: 2008